Explain Images with Multimodal Recurrent Neural Networks

Authors

Junhua Mao

Wei Xu

Yi Yang

Jiang Wang

Alan L. Yuille

Abstract

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
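The generation procedure described in the abstract (a distribution over the next word given the previous words and the image, with captions obtained by sampling) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class `ToyCaptioner`, the layer sizes, the random untrained weights, and the choice to fuse the word, recurrent, and image signals by summation followed by `tanh` are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyCaptioner:
    """Hypothetical toy model with random, untrained weights."""

    def __init__(self, vocab, hidden=8, image_dim=4, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab  # assumed to contain "<start>" and "<end>"
        self.hidden = hidden
        self.embed = random_matrix(rng, len(vocab), hidden)  # one row per word
        self.w_rec = random_matrix(rng, hidden, hidden)      # recurrent weights
        self.w_img = random_matrix(rng, hidden, image_dim)   # image projection
        self.w_out = random_matrix(rng, len(vocab), hidden)  # output scores

    def step(self, word_id, state, image_feat):
        """Return P(next word | previous words, image) and the new state."""
        emb = self.embed[word_id]
        rec = matvec(self.w_rec, state)
        new_state = [math.tanh(e + r) for e, r in zip(emb, rec)]
        img = matvec(self.w_img, image_feat)
        # Assumed fusion step standing in for the multimodal layer.
        fused = [math.tanh(e + s + i) for e, s, i in zip(emb, new_state, img)]
        return softmax(matvec(self.w_out, fused)), new_state

    def sample_caption(self, image_feat, max_len=10, seed=0):
        """Sample words one at a time until "<end>" or max_len is reached."""
        rng = random.Random(seed)
        state = [0.0] * self.hidden
        word_id = self.vocab.index("<start>")
        end_id = self.vocab.index("<end>")
        words = []
        for _ in range(max_len):
            probs, state = self.step(word_id, state, image_feat)
            word_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
            if word_id == end_id:
                break
            words.append(self.vocab[word_id])
        return words


if __name__ == "__main__":
    model = ToyCaptioner(["<start>", "<end>", "a", "dog", "runs"])
    # The image feature vector is a made-up stand-in for CNN output.
    print(model.sample_caption([1.0, 0.0, 0.5, 0.2], max_len=6, seed=1))
```

Because the weights are random, the sampled words are meaningless; the sketch only shows the control flow of conditioning each step on the image features and the words generated so far.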

Similar resources

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional networ...

Robust stability of stochastic fuzzy impulsive recurrent neural networks with time-varying delays

In this paper, global robust stability of stochastic impulsive recurrent neural networks with time-varying delays which are represented by the Takagi-Sugeno (T-S) fuzzy models is considered. A novel Linear Matrix Inequality (LMI)-based stability criterion is obtained by using Lyapunov functional theory to guarantee the asymptotic stability of uncertain fuzzy stochastic impulsive recurrent neural...

WMT 2016 Multimodal Translation System Description based on Bidirectional Recurrent Neural Networks with Double-Embeddings

Bidirectional Recurrent Neural Networks (BiRNNs) have shown outstanding results on sequence-to-sequence learning tasks. This architecture becomes especially interesting for the multimodal machine translation task, since BiRNNs can deal with images and text. In most translation systems the same word embedding is fed to both BiRNN units. In this paper, we present several experiments to enhance a baseli...

Image Caption Generation with Recursive Neural Networks

The ability to recognize image features and generate accurate, syntactically reasonable text descriptions is important for many tasks in computer vision. Auto-captioning could, for example, be used to provide descriptions of website content, or to generate frame-by-frame descriptions of video for the vision-impaired. In this project, a multimodal architecture for generating image captions is ex...

Image Backlight Compensation Using Recurrent Functional Neural Fuzzy Networks Based on Modified Differential Evolution

In this study, an image backlight compensation method using adaptive luminance modification is proposed for efficiently obtaining clear images. The proposed method combines the fuzzy C-means clustering method, a recurrent functional neural fuzzy network (RFNFN), and a modified differential evolution. The proposed RFNFN is based on the two backlight factors that can accurately detect the compensat...

Journal:

CoRR

Volume abs/1410.1090

Publication date: 2014
